Abstract 

Catechins are the most important bioactive compounds in tea, and have been demonstrated to possess a wide variety of 
pharmacological activities. To characterize quantitative trait loci (QTLs) for catechins content in the tender shoots of tea 
plant, we constructed a moderately saturated genetic map using 406 simple sequence repeat (SSR) markers, based on a 
pseudo-testcross population of 183 individuals derived from an intraspecific cross of two Camellia sinensis varieties with 
diverse catechins composition. The map consisted of fifteen linkage groups (LGs), corresponding to the haploid 
chromosome number of tea plant (2n = 2x = 30). The total map length was 1,143.5 cM, with an average locus spacing of 
2.9 cM. A total of 25 QTLs associated with catechins content were identified over two measurement years. Of these, nine 
stable QTLs were validated across years, and clustered into four main chromosome regions on LG03, LG11, LG12 and LG15. 
The population variability explained by each QTL was predominantly at moderate-to-high levels and ranged from 2.4% to 
71.0%, with an average of 17.7%. The total number of QTL for each trait varied from four to eight, while the total population 
variability explained by all QTLs for a trait ranged between 38.4% and 79.7%. This is the first report on the identification of 
QTL for catechins content in tea plant. The results of this study provide a foundation for further cloning and functional 
characterization of catechin QTLs for utilization in improvement of tea plant. 
Introduction 

Tea, one of the most widely consumed non-alcoholic beverages 
in the world, is an infusion of the cured leaves of tea plant (Camellia 
sinensis (L.) O. Kuntze), which is grown primarily in the tropical 
and subtropical regions of Asia, Africa, and Latin America. 
According to the International Tea Committee (ITC), over 4.21 
million tons of made tea were produced in 2011, of which more 
than 84% came from Asia [1]. The area planted with tea in 
China, the world's largest tea producer, has increased to around 
2.11 million hectares with an annual production of about 1.62 
million tons, contributing nearly 37.8% of the world production, 
followed by India (23.0%), Kenya (8.8%), Sri Lanka (7.6%), and 
Vietnam (4.1%) [1]. 

Tea recently has attracted worldwide attention for its multiple 
health-promoting effects, especially in regard to its potential for 
chemoprevention of cancer, cardiovascular, and neurological 
diseases [2-5]. Increasing scientific and customer interest in the 
health benefits of tea has led to the increased consumption of tea 
products, particularly with respect to tea extracts, which can be 
used as a featured ingredient in a range of food, beverage, and 
cosmetic products [6,7]. Tea contains a number of bioactive 
compounds, such as the flavonoids, caffeine, and L-theanine. 



Overall, the major constituents in tea extracts are catechins, a 
group of flavonoids, which contribute up to 30% of the dry weight 
of the tender tea shoots ("two and a bud", i.e. one apical bud and 
two adjacent leaves) [8]. Catechins in green tea leaves consist 
mainly of four flavan-3-ols, viz. (-)-epicatechin (EC), (-)-epicatechin 
gallate (ECG), (-)-epigallocatechin (EGC), and (-)-epigallocatechin 
gallate (EGCG) [9]. Of these, EGCG is the most abundant, 
accounting for up to 70% of the total tea catechins [9], and has 
been demonstrated to possess a wide variety of pharmacological 
activities including antioxidant and radical-scavenging activity, 
which is thought to be an underlying protective mechanism 
primarily responsible for the health benefits of tea [2,10,11]. 
Therefore, the issue of improving catechins content and compo- 
sition has become increasingly important in tea plant breeding 
programs. 

The tea plant is a dicotyledonous, perennial, and woody plant. 
Its diploid genome consists of 15 chromosome pairs, with an 
estimated size of ~4.0 Giga bases [12]. There are two main 
cultivated varieties of tea plant, i.e. the C. sinensis var. sinensis, which 
is predominantly grown in China, Japan and Turkey, where most 
of the harvest is used to produce green tea, and the C. sinensis var. 
assamica (Masters) Kitamura, which is cultivated worldwide and 
usually used for black tea production due to its high content of tea 
polyphenols. The tea plant is highly heterozygous, generally self- 
incompatible, and its juvenile stage can span up to 4-5 years. 
Traditional approaches to genetic improvement in tea plant are 
therefore extremely difficult and time-consuming, especially for 
complex traits such as catechins content, which are controlled by 
multiple genes known as quantitative trait loci (QTLs). 

Linkage-based QTL mapping is a powerful tool for elucidating 
the molecular basis of complex traits, facilitating identification of 
specific genes responsible for particular genetic traits and 
estimation of gene actions and genetic parameters. It can 
also provide useful insight into the development of new 
approaches to improve the efficiency and precision of conventional 
plant breeding via marker-assisted selection (MAS) [13,14]. A 
number of QTLs for important agronomic traits have been 
detected and characterized in grain crops (e.g. rice [15], maize 
[16]), vegetables and fruits (e.g. tomato [17], eggplant [18]). 
However, the application of QTL mapping in tea plant is difficult. 
One reason is that the natural features of self-incompatibility and 
low seed yield of tea plant make it difficult to create suitable 
mapping populations, and another is the lack of available 
genetic markers. To the best of our knowledge, there have been 
only seven reported cases of linkage maps and one case of mapping 
QTLs of yield-related traits in tea plant [19-26]. Catechin 
contents, although major determinants of tea quality, have not 
yet been subjected to QTL analysis. 

SSRs, also known as microsatellites, are among the most popular 
and versatile marker types used in plant genetic mapping studies, 
due to their desirable features such as high abundance, locus 
specificity, codominant inheritance, high polymorphism 
information content, and reproducibility [27]. According to their origins, 
SSRs are divided into two categories: genomic SSRs and genic SSRs 
(EST-SSRs). The development of genomic SSR markers has 
traditionally been a difficult, costly, and time-consuming process, 
which involves the construction of genomic libraries enriched for 
specific repeat motifs [28]. To date, only a limited number of 
validated genomic SSR markers (<100) are available for tea plant 
[29-33]. In contrast, genic SSRs can easily be developed by 
screening the collection of clustered ESTs in publicly available 
databases. Furthermore, genic SSRs are derived from the 
expressed regions of the genome, and thus have increased 
potential for tagging and mapping of genes and QTLs [27]. 
Genic SSRs are also highly transferable, because their flanking 
sequences are more likely to be conserved in related species, and 
therefore can be used as anchor markers for comparative 
mapping. The approximate total number of genic SSR markers 
available in tea plant has increased to 900 thus far [25,34-43]. 
However, this is still insufficient for constructing a saturated 
SSR-based genetic map of tea plant. 

Recently, we obtained approximately 57 million RNA-Seq 
reads by deep sequencing of the tea plant transcriptome, using the 
Illumina sequencing platform [44]. These sequence data provide a 
good resource for the development of genic SSR markers. Thus 
the aims of the current study were to: (1) develop a set of novel 
genic SSR markers for tea plant, (2) construct an SSR-based genetic 
linkage map using a pseudo-testcross population generated from 
an inter-varietal cross of C. sinensis, and (3) identify QTLs 
controlling catechins content in the tender shoots for marker-assisted 
selection of tea plant in the near future. 



Materials and Methods 

Mapping population and DNA isolation 

A pseudo-testcross population consisting of 300 individuals was 
developed by crossing two tea plant accessions chosen from the 
China National Germplasm Tea Repository (CNGTR) in the 
TRICAAS located at Hangzhou, Zhejiang, China. The maternal 
parent, 'Yingshuang' (hereinafter 'YS'), is an improved cultivar 
derived from the cross of two major Chinese cultivars, C. sinensis 
var. sinensis cv. Fuding Dabaicha (hereinafter 'FD') and C. sinensis var. 
assamica cv. Yunnan Dayecha (hereinafter 'YD'). The paternal parent, 
'Beiyue Danzhu' (hereinafter 'BD'), is a landrace of C. sinensis var. 
pubilimba Chang originally collected from Southwest China. The 
cultivar 'YS' has an early-sprouting growth habit, medium-sized 
leaves, and moderate catechins content in the tender shoots, as 
compared to 'BD' which has a late-sprouting growth habit, large- 
sized leaves, and higher catechins content in the tender shoots. 
The entire population and both parents were grown at the 
TRICAAS Experimental Station. A subset of 183 individuals was 
selected for genetic mapping and QTL analysis. DNA was 
extracted from young leaves and buds of each plant using a 
CTAB method with minor modifications [45] . DNA quality was 
assessed by electrophoresing in a 0.8% TBE-agarose gel, stained 
with ethidium bromide. DNA concentration was determined 
utilizing UV/Vis spectroscopy, and then adjusted to 10 ng/mL for 
SSR analysis. 

SSR mining and primer design 

A set of unigene sequences generated from the transcriptome 
assembly of tea plant in our previous studies [44] were searched 
for SSRs using the MISA software (http://pgrc.ipk-gatersleben.de/misa; 
[46]). For di-, tri-, tetra-, penta-, hexa- and higher-order 
nucleotide motifs, the minimum numbers of repeats were defined 
as 8, 5, 4, 3, 3 and 3, respectively. Primer pairs were designed 
based on the flanking sequences of each SSR using Primer3 
(http://bioinfo.ut.ee/primer3-0.4.0/primer3/) with threshold 
criteria of 18-24 bp primer length, 40-60% GC content, and an 
estimated amplicon size of 100-300 bp. 
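
The repeat thresholds above can be illustrated with a minimal Python sketch. This is not the MISA implementation: it is a regular-expression approximation that assumes perfect (uninterrupted) repeats, covers only motif lengths of 2 to 6 nucleotides (higher-order motifs are omitted for brevity), and does not merge compound SSRs. The names `MIN_REPEATS`, `find_ssrs` and the example sequence are hypothetical.

```python
import re

# Minimum repeat counts per motif length, taken from the thresholds in the text
# (di-: 8, tri-: 5, tetra-: 4, penta-: 3, hexa-: 3).
MIN_REPEATS = {2: 8, 3: 5, 4: 4, 5: 3, 6: 3}


def _is_primitive(motif):
    """False if the motif is itself a repeat of a shorter unit (e.g. 'ATAT', 'AA')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # ([ACGT]{size}) captures a motif; \1{min_rep-1,} requires the further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not _is_primitive(motif):
                continue  # already reported (or excluded) at a shorter motif length
            hits.append((m.start(), motif, len(m.group(0)) // size))
    return sorted(hits)


# Hypothetical unigene fragment with one di- and one trinucleotide SSR.
example = "GGC" + "AG" * 8 + "TTC" + "CTT" * 5 + "A"
print(find_ssrs(example))
```

Skipping non-primitive motifs keeps a run such as (AG)8 from being counted a second time as (AGAG)4; a production tool would additionally handle compound and interrupted repeats.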

SSR screening and genotyping 

A total of 1,509 SSR primer pairs, including 1,141 newly 
developed and 368 previously reported in different Camellia 
species, were initially screened for polymorphism between the 
mapping parents. Those primer pairs that successfully amplified a 
single locus, which was polymorphic in at least one of the parents, 
were subsequently used to determine the genotype of F1 progeny. 
The PCR mixture consisted of the following in 10 µL total volume: 
10 ng genomic DNA, 10 mM dNTPs, 10 µM of each primer, 
0.5 U Taq polymerase (Beijing Dingguo Biotech, Beijing, China) 
and 10× PCR buffer supplied together with the enzyme. 
Amplification was performed according to the method of Zhao 
et al. [35] using the following reaction conditions: 4 min at 94°C 
for initial denaturation, followed by 35 cycles of 94°C for 30 s, 
annealing temperature for 30 s, 72°C for 30 s, with a final 
extension at 72°C for 7 min. PCR products were separated in 
10% polyacrylamide gels and detected by silver staining [47]. 

Linkage map construction 

The genetic map was constructed using a pseudo-testcross 
mapping strategy as described by Grattapaglia and Sederoff [48] . 
Linkage analyses were performed using the JoinMap 4 software 
with a cross-pollinator (CP) population type [49]. Chi-square 
statistics were calculated for each marker to assess the segregation 
deviation (SD) from the expected Mendelian ratios. Both distorted 
and non-distorted markers were used for linkage mapping. A 
framework linkage map was primarily constructed based on non- 
distorted markers, using a "One-step method" as described by 
Rabbi et al. [50]. In brief, a logarithm of odds ratio (LOD) score of 
4.0 was used to assign markers to linkage groups (LGs), in which 
the ordering of markers was determined using regression mapping 
with default parameters. Then, the distorted markers were 
separately added onto the linkage map, except the ones greatly 
affecting the order of their neighbor markers or excessively 
changing linkage distance. The Kosambi function was used for the 
estimation of map distances. The graphical maps were generated 
using MapChart version 2.1 [51]. 
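
For readers unfamiliar with the Kosambi function mentioned above, it converts a recombination fraction r into a map distance d = 25 ln((1 + 2r)/(1 - 2r)) cM. The short Python sketch below shows the function and its inverse; the function names are illustrative and are not part of JoinMap.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction r (0 <= r < 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm):
    """Inverse mapping: recombination fraction for a Kosambi distance in cM."""
    return 0.5 * math.tanh(d_cm / 50.0)


# Distances grow faster than r as r approaches 0.5, reflecting interference.
for r in (0.01, 0.10, 0.30):
    print(r, round(kosambi_cm(r), 2))
```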

Catechins extraction and identification 

In the spring of 2010 and 2011, the "two and a bud" tender 
shoots of each plant were collected and immediately steamed at 
120°C for 5 min, and then dried to constant weight at 80°C. The 
isolation and detection of tea catechins, by high performance 
liquid chromatography (HPLC), was performed according to the 
International Organization for Standardization (ISO) ISO 14502-2:2005(E) 
procedure [52] with minor modifications. In brief, the processed 
tea shoots were ground into a fine powder, and a sample of 
0.2±0.0001 g of the powder was subsequently extracted twice 
with 5 mL of 70% methanol by heating at 70°C in a water bath for 
10 min, followed by centrifuging at 3500 rpm for 10 min. The 
extracts were combined and made up to 10 mL with 70% 
methanol, and then 1 mL volumes of the extracts were diluted to 
5 mL with stabilizing solution (10% v/v acetonitrile, 500 µg/mL 
EDTA and ascorbic acid). Afterward the dilutions were filtered 
through 0.45 µm nylon filters, and 10 µL were applied to HPLC- 
UVD (Waters 2695-2489) for catechins quantification. HPLC was 
carried out on a SunFire C18 column (5 µm, 4.6×250 mm) at 
35°C. A binary gradient elution system was adopted with a mobile 
phase consisting of solvent A (1% v/v formic acid) and solvent B 
(acetonitrile). The elution profile was as follows: 93.5 to 85% A 
from 0 to 16 min, 85 to 75% A from 16 to 20 min, 75.0 to 93.5% 
A from 20 to 26 min, and held at 93.5% from 26 to 30 min to 
equilibrate the column for the next injection. The flow rate was 
maintained at 1 mL/min, and the detection wavelength was set at 
278 nm. Individual catechins were identified based on retention 
times of unknown peaks in comparison with those of peaks 
obtained from catechin standards, and the calibration curves for 
quantification were generated for each catechin by linear 
regression of the standard peak areas versus the standard 
concentrations in mg/mL (Figure S1). 
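
The quantification step can be sketched in Python as follows. The standard concentrations and peak areas are invented for illustration, and the conversion to mg/g assumes only the volumes stated above (0.2 g of powder, 10 mL of combined extract, a five-fold dilution before injection); any dry-matter correction applied by the authors is not modelled. All function names are hypothetical.

```python
def fit_calibration(concs, areas):
    """Least-squares line area = slope * conc + intercept from standard runs."""
    n = len(concs)
    mean_x = sum(concs) / n
    mean_y = sum(areas) / n
    sxx = sum((x - mean_x) ** 2 for x in concs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concs, areas))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_area(area, slope, intercept):
    """Invert the calibration line: concentration (mg/mL) of the injected solution."""
    return (area - intercept) / slope


def content_mg_per_g(conc_mg_per_ml, extract_ml=10.0, dilution=5.0, sample_g=0.2):
    """Scale the injected-solution concentration back to mg per g of tea powder."""
    return conc_mg_per_ml * dilution * extract_ml / sample_g


# Hypothetical standards (mg/mL) and peak areas (arbitrary units).
slope, intercept = fit_calibration([0.01, 0.02, 0.05, 0.10], [1200, 2400, 6000, 12000])
conc = concentration_from_area(3000, slope, intercept)
print(round(conc, 4), round(content_mg_per_g(conc), 2))
```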

Statistical analysis and QTL mapping 

Statistical analysis of phenotypic data was conducted using the 
SAS software (SAS Institute Inc., Cary, North Carolina). For each 
trait, the mean and standard deviation of each plant were 
calculated, and the significance of differences between parental 
values was analyzed using the Mann-Whitney U test. Pearson's 
correlation coefficients for pairwise trait combinations were also 
calculated. QTL mapping was performed using the MapQTL 4.0 
software, employing the single-QTL model (i.e. interval mapping, 
IM) in combination with the restricted multiple QTL model 
(rMQM) [53]. In brief, IM was performed to detect QTLs, and 
then rMQM was used to refine QTL positions. In rMQM 
mapping, the marker with the highest LOD score was selected as a 
cofactor, and multiple rounds of rMQM were performed until the 
cofactors selected were stable. The LOD score threshold for QTL 
declaration was determined using the permutation test (1,000 
replications) at a genome-wide level of 5%. QTL were designated 
for peaks that reached significance, and QTL regions were defined 
by 1- and 2-LOD support intervals. QTL for each trait across different years 
were declared to be stable if their confidence intervals overlapped. 
A QTL was classified as major or minor based on a threshold of 10% of explained 
population variability (PVE). The additive (or average allele 
substitution) effects from maternal parent 'YS' (a1) and paternal 
parent 'BD' (a2), and dominance effects of each QTL were 
estimated by the model of Knott et al. [54] as used by Qin et al. 
[55]. The maternal parent 'YS' was derived from the cross of 'FD' 
and 'YD'; therefore a QTL with a positive or negative a1 
meant that the allele for increasing catechins content was 
contributed by 'FD' or 'YD', respectively. The a2 was not 
further analyzed due to a lack of required information about the 
parents of 'BD'. 
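
The two classification rules described above (stable versus putative, major versus minor) can be expressed as a small Python sketch. The function names and the example support intervals are hypothetical, and treating a PVE of exactly 10% as "major" is an assumption, since the text does not say whether the boundary is inclusive.

```python
def intervals_overlap(a, b):
    """True if closed intervals a=(start, end) and b=(start, end), in cM, overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


def classify_qtl(interval_y1, interval_y2, pve_percent, major_cutoff=10.0):
    """Return (is_stable, effect_class) for one QTL on a given linkage group.

    interval_y1 / interval_y2: support intervals (start_cM, end_cM) in each
    year, or None if the QTL was not detected in that year.
    pve_percent: percentage of population variability explained.
    """
    stable = (
        interval_y1 is not None
        and interval_y2 is not None
        and intervals_overlap(interval_y1, interval_y2)
    )
    # Assumption: PVE >= cutoff counts as major (boundary handling not stated).
    effect = "major" if pve_percent >= major_cutoff else "minor"
    return stable, effect


# Hypothetical intervals; the PVE values are only illustrative inputs.
print(classify_qtl((20.5, 31.0), (24.0, 35.5), 27.6))
print(classify_qtl((10.0, 15.0), None, 5.0))
```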

Results 

Novel genic SSRs 

A total of 59,962 unigenes derived from the transcriptome 
assembly of tea plant were randomly selected and used for SSR 
mining. The results showed that there were overall 7,589 SSR- 
containing unigenes (12.7% of those searched), representing a total 
of 9,239 genic SSRs. The estimated distance between SSRs was 3.98 kb on 
average. 
Dinucleotide was the most common repeat unit (38.9%), followed 
by tri- (27.2%), hexa- (17.2%), penta- (10.7%), tetra- (4.4%), and 
higher-order repeats (1.6%) (Table 1). Almost half of these unigenes 
were not suitable for marker development, because their SSR- 
flanking sequences were too short for designing PCR primers. 
Consequently, a total of 4,713 primer pairs were successfully 
designed, and 1,141 of them were selected for further analysis. 

Marker polymorphism 

Out of the 1,509 genic and genomic SSR primer pairs screened, 450 
(29.8%) exhibited reproducible amplification and distinct 
polymorphism between the mapping parents (Table 2). The level of 
polymorphism was significantly higher for genic SSR markers 
(30.6%) than for genomic SSR markers (6.1%). The female 
parent 'YS' was less heterozygous than the male parent 'BD'. Of 
the 450 informative marker loci, 198 segregated in both parents 
(44.0%): 13 with four segregating alleles, 127 with three alleles, 58 
with two alleles; and in parallel, 74 segregated only in 'YS' (16.4%) 
while 178 only in 'BD' (39.6%). Primer sequences and 
characteristics of the newly developed genic SSR markers are provided in 
Table S1. 

Segregation of the markers 

For each of the 450 informative SSR markers, genotypes were 
obtained for all 183 offspring. Chi-square analysis of genotypic 
data for the mapping population revealed that 320 (71.1%) loci 
fitted the expected Mendelian ratio, while 42 (9.3%) loci exhibited 
slight SD (0.01<P<0.05) and 88 (19.6%) were severely distorted 
(P<0.01). Among the 130 skewed marker loci, 68 (52.3%) showed 
distortion in favor of the 'YS' genotype; 15 (11.5%) showed 
distortion in the direction of the 'BD' genotype; 11 (8.5%) 
deviated towards the parental genotypes; and 36 (27.7%) were 
skewed towards the heterozygous genotypes (Table 3). 
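
The chi-square classification used above can be sketched in Python. The expected ratios are the standard Mendelian expectations for each segregation type in a cross-pollinator population (1:1 for lm×ll and nn×np, 1:2:1 for hk×hk, 1:1:1:1 for ef×eg and ab×cd); that the authors applied exactly these ratios is an assumption, and the genotype counts are invented. The P-value uses closed forms valid only for 1 to 3 degrees of freedom, so no external statistics library is needed.

```python
import math

# Expected genotype ratios per segregation type (standard CP-population values).
EXPECTED = {
    "lmxll": (1, 1),
    "nnxnp": (1, 1),
    "hkxhk": (1, 2, 1),
    "efxeg": (1, 1, 1, 1),
    "abxcd": (1, 1, 1, 1),
}


def chi_square_stat(observed, ratio):
    """Pearson chi-square statistic for observed counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    return sum(
        (obs - total * part / ratio_sum) ** 2 / (total * part / ratio_sum)
        for obs, part in zip(observed, ratio)
    )


def p_value(stat, df):
    """Upper-tail chi-square probability (closed forms for df = 1, 2, 3 only)."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    if df == 3:
        return math.erfc(math.sqrt(stat / 2.0)) + math.sqrt(
            2.0 * stat / math.pi
        ) * math.exp(-stat / 2.0)
    raise ValueError("only df = 1, 2 or 3 is supported in this sketch")


def segregation_test(seg_type, observed):
    """Return (chi2, P, class) using the P<0.01 / P<0.05 cut-offs from the text."""
    ratio = EXPECTED[seg_type]
    stat = chi_square_stat(observed, ratio)
    p = p_value(stat, len(ratio) - 1)
    label = "severe" if p < 0.01 else "slight" if p < 0.05 else "none"
    return stat, p, label


# Hypothetical genotype counts for 183 progeny.
print(segregation_test("nnxnp", [110, 73]))
print(segregation_test("hkxhk", [46, 92, 45]))
```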

Linkage map 

Genotypic data of all 450 loci were used for linkage analysis. As 
a result, 406 SSR markers were finally mapped into 15 linkage 
groups, corresponding to the haploid chromosome number of tea 
plant (Table 4; Figure 1). The linkage map covered a total genetic 
distance of 1,143.5 cM, with an average locus spacing of 2.9 cM. 
Individually, the linkage groups ranged in size from 52.4 cM 
Table 1. Occurrence of SSRs in the transcriptome of tea plant. 

Repeat type      Number  Proportion (%)  Frequency (%)  Average distance (kb/SSR)
Dinucleotide     3,595   38.9            6.0            10.23
Trinucleotide    2,514   27.2            4.2            14.62
Tetranucleotide  408     4.4             0.7            90.14
Pentanucleotide  989     10.7            1.7            37.18
Hexanucleotide   1,586   17.2            2.7            23.18
Higher-order     147     1.6             0.3            250.19
Total            9,239   100             15.6           3.98

doi:10.1371/journal.pone.0093131.t001 



(LG15) to 100.7 cM (LG01 and LG08), and the number of 
markers on each group varied from 17 for LG11 to 43 for LG01 
(Table 4). LG03 possessed the highest marker density with an 
average locus distance of 2.2 cM, while the other LGs had 
relatively lower marker density (2.3-3.9 cM/SSR). The majority 
of intervals between adjacent markers were shorter than 20 cM, 
and there was only one larger marker interval, located on 
LG04 (TUGMS43A-TM474), with a map distance of 29.3 cM. 

Based on the anchor markers which were mapped both in the 
current map and the reference map (published by Taniguchi et al. 
[25]), the relationships between LGs of two maps were established 
(Figure S2). Each LG contained at least one anchor locus, and the 
map position of these loci showed good collinearity between the 
homologous LGs. The newly developed genetic map covered 94% 
of 1,212 cM of the reference map, and the lengths of homologous 
LGs of two maps were generally concordant (Table 4). 
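
The summary statistics reported for the map can be checked with a few lines of Python. This is a sketch under one assumption: that the average locus spacing was obtained by dividing the total map length by the number of marker intervals (markers minus linkage groups), which reproduces the reported 2.9 cM. The function names are illustrative.

```python
def average_spacing(total_cm, n_markers, n_groups):
    """Mean distance between adjacent loci: total length / number of intervals."""
    return total_cm / (n_markers - n_groups)


def coverage_percent(map_cm, reference_cm):
    """Length of this map as a percentage of the reference map length."""
    return 100.0 * map_cm / reference_cm


# Values taken from the text: 1,143.5 cM, 406 markers, 15 LGs, 1,212 cM reference.
print(round(average_spacing(1143.5, 406, 15), 1))
print(round(coverage_percent(1143.5, 1212.0)))
```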

The linkage map contained a total of 94 distorted markers 
which were marked with asterisks and denoted on the map 
according to their distortion level (Figure 1). The majority of these 
markers (68.1%) were clustered on five linkage groups (LG01, 
LG04, LG07, LG08, and LG14), and the remaining 31.9% were 
randomly distributed on other linkage groups except for LG11, 
LG12, and LG13. Fifteen (75%) of the mapped markers on LG04 and 
17 (63%) of those on LG14 were severely distorted (P<0.001). 



To test whether the genotypic SD was caused by gametic and/ 
or zygotic selection, segregation patterns of skewed markers were 
analyzed using the method as described by Li et al. [56] . In brief, 
gametic selection (allelic SD test) was demonstrated by testing the 
deviation of the observed allelic distribution from the expected 
Mendelian ratio for each heterozygous parent, and zygotic 
selection (zygotic SD test) was assessed by testing the observed 
genotypic distribution from the genotypic ratio that was expected 
given allele frequency estimates (for details see Li et al. [56]). Any 
region with at least three adjacent marker loci showing significant 
SD was considered as a candidate segregation distortion region 
(SDR), wherein the skewed markers showed similar SD patterns 
and had similar gametic and zygotic selection tests. 

Out of 94 total skewed markers, only six showed significant 
zygotic SD (P<0.05) while 85 (90.4%) were significantly distorted 
based on combination of parental allelic SD tests (Table S2). Thus 
gametic selection seemed to play a major role in contributing to 
the SD in the present mapping population. Five candidate SDRs 
were identified on five LGs (blue-filled LG segments in Figure 1). 
The LG04 SDR contained five distorted markers, covering a map 
distance of 13.0 cM; the SDRs on LG08 and LG14 each 
comprised four SD loci, and those on LG01 and LG07 each 
comprised three. All the markers within the five SDRs showed allelic SD but 
not zygotic SD. Of these, the markers on LG14 exhibited excess 



Table 2. Summary of marker sets and the informative SSR markers validated in the 'YS'x'BD' tea plant population. 

Type          Marker set              Acronym   Tested  Mappable  Segregation type
                                                                  lm×ll  nn×np  hk×hk  ef×eg  ab×cd
Genic SSR     Newly developed (a)     TM        1,141   328       62     135    33     88     10
              Taniguchi et al. [25]   MSE/MSG   45      26        4      10     0      10     2
              Jin et al. [34]         P         10      4         0      2      0      2      0
              Sharma et al. [36]      TUGMS     61      15        2      6      1      5      1
              Ma et al. [37,40]       TM        104     37        2      9      13     13     0
              Zhou et al. [38]        TM        59      25        4      10     6      5      0
              Yao et al. [41]         TM        40      12        0      5      5      2      0
Genomic SSR   Freeman et al. [29]     CamsinM   13      1         0      0      0      1      0
              Hung et al. [30]        Ca        11      0         0      0      0      0      0
              Chen et al. [31] (b)    CN        10      1         0      0      0      1      0
              Yang et al. [32] (c)    A         15      1         0      1      0      0      0
Total                                           1,509   450       74     178    58     127    13

(a) Novel genic SSRs developed from the transcriptome of C. sinensis. 
(b, c) Genomic SSRs derived from C. nitidissima Chi and C. taliensis (W. W. Sm.) Melch, respectively. 
doi:10.1371/journal.pone.0093131.t002 
Table 3. Summary of skewed SSR markers. 

Segregation type  Number of distorted markers
                  I*    II*   III*  IV*   Total
ab×cd             -     -     -     2     2
ef×eg             16    4     3     19    42
hk×hk             -     -     8     15    23
lm×ll             11    1     -     -     12
nn×np             41    10    -     -     51
Total             68    15    11    36    130

* Markers exhibiting skewed genotypic frequencies toward maternal parent 'YS' 
(I), paternal parent 'BD' (II), both parents (III) and heterozygotes (IV), 
respectively. 

doi:10.1371/journal.pone.0093131.t003 



heterozygosity, while the markers on the other four LGs exhibited 
skewed genotypic frequencies toward maternal parent 'YS' (Table S2). 

Catechins content 

The principal components of tea catechins, including EC, ECG, 
EGC, and EGCG, were quantified in F1 progeny and both parents 
in two measurement years. Mean value, standard deviation, range, 
and coefficient of variation for each catechin are shown in Table 5. 
Overall, EGCG was the most abundant catechin compound. 
Significant differences were observed between the parents for all 
catechins, with the exception of EGCG in 2010. The male parent 
'BD' had higher content of EC and ECG than the female parent 
'YS', while the content of EGC and EGCG varied considerably 
between years. 

High levels of variation in catechins content were observed in F1 
progeny (Table 5). The means of the population were towards the 
lower end of ranges for all catechin compounds, and each of them 
exhibited a continuous variation with transgressive segregation, 
indicating a polygenic inheritance (Figure S3). Pearson's 
correlations between catechin compounds are shown in Table 6. 
The data show that the individual catechins correlated 
significantly with each other, with the strongest positive correlation 
between EC and ECG (r = 0.61, P<0.001), and the strongest 
negative correlation between ECG and EGCG (r = -0.52, 
P<0.001). 

Identified QTLs 

A total of 25 QTLs associated with catechins content of tea 
plant were detected over two measurement years by both IM and 
rMQM analyses at the significant genome-wide threshold of 5% 
(Table S3). Of these, nine stable QTLs were validated across years 
(Table 7), and mainly clustered into four genomic regions on 
LG03, LG11, LG12 and LG15 (Figure 1). The other QTLs, only 
detected in one year and therefore considered as putative QTLs, 
were located on LG02, LG03, LG10, LG14 and LG15. The 
population variability explained by each QTL was predominantly 
at moderate-to-high levels and ranged from 2.4% to 71.0%, with 
an average of 17.7%. Thirteen major QTLs were identified for 
four traits in three main regions on LG02, LG03 and LG11 (Table S3). 
The total number of QTL for each trait varied from four (EC) 
to eight (ECG and EGC), while the total population variability 
explained by all QTLs for a trait ranged between 38.4% (EC 
2011) and 79.7% (ECG 2011). For most QTLs, the additive 
effects from maternal parent 'YS' (a1) were significantly higher than 
those from paternal parent 'BD' (a2), indicating that the alleles for 
increasing catechins content at these loci were probably mainly 
contributed by the parents of 'YS'. 

QTL for EC content. Two major and stable QTLs, qEC3 
and qEC11, were identified for EC content, accounting for a total 
of 41.7% and 38.4% of the population variability in 2010 and 
2011 respectively (Figure 1; Table 7). The qEC3 was located on 
LG03 close to markers TM376-TM546 with an average PVE of 
12.5% over the years, while qEC11 was located on LG11 close to 
markers TM623-TM586 with an average of 27.6% of the 
population variability. The two QTLs both showed positive 
additive effect a1, indicating that the allele leading to an increase 
in EC was contributed by 'FD' (the female parent of 'YS'). The 
average increased effect was 4.66 and 7.21 mg respectively for 
qEC3 and qEC11. 

QTL for ECG content. Four stable QTLs were detected for 
ECG content, collectively explaining 79.6% and 79.7% of the total 
population variability in 2010 and 2011 respectively (Figure 1; 
Table 7). The QTLs were distributed across four LGs, with LOD scores 
ranging from 4.66 to 55.63. The major QTL, qECG11, was 
located on LG11 close to marker TM586 and had the largest effect 
on ECG content, accounting for an average of 70.4% of the 
population variability over the years. The allele contributing to the 
additive effect a1 at this locus came from 'FD' and increased ECG 
by 31.48 mg, on average over the years. The other three QTLs 
with small effects were detected on LG03, LG12 and LG15. These 
QTL (qECG3, qECG12 and qECG15) explained an average of 
3.3, 2.5 and 3.6% of the population variability, respectively. 

QTL for EGC content. A total of eight QTLs were detected 
for EGC content, accounting for a total of 43.5% and 49.4% of 
the population variability in 2010 and 2011 respectively (Figure 1; 
Table S3). Of these, two major and stable QTLs, qEGC3 and 
qEGC11, were mapped on LG03 and LG11, respectively (Table 7). 
The qEGC3 was located close to markers TM136-TM412 with an 
average PVE of 13.4% over the years, while the qEGC11 was 
located close to markers TM623-TM435 with an average PVE of 
15.8%. The qEGC3 showed a positive a1, meaning that the allele 
for increasing EGC was from 'FD', whereas qEGC11 showed a 
negative a1, indicating that the allele for increasing EGC was from 'YD'. The increased effect 
was 14.09 and 15.55 mg respectively for qEGC3 and qEGC11. An 
additional four putative QTLs were identified on LG02, LG10, 
LG14 and LG15 (Table S3). The LOD score for these loci varied 
from 4.25 to 9.68, and the PVE ranged between 5.0% and 15.7%. 

QTL for EGCG content. One major and stable QTL, 
qEGCG11, was identified for EGCG content on LG11, close to 
markers TM435-TM586, contributing to 46.3% of the population 
variability on average over the years (Figure 1; Table 7). The 
additive effect a1 of the 'YD' allele at this locus increased EGCG 
on average by 40.63 mg. In addition, three putative QTLs were 
mapped on LG03 and LG10, and explained between 2.4 and 
8.6% of the population variability (Table S3). 

Discussion 

Genic SSRs derived from the tea plant transcriptome 

In the past several years, large-scale sequencing and the 
application of next-generation sequencing (NGS) technologies 
have exponentially increased the volume of in silico databases of 
nucleotide sequences, from which SSR markers can be rapidly 
developed at low cost [57,58]. Although SSR mining is not usually 
the first priority of such sequencing projects, the enormous 
amounts of sequence data can be used for this purpose [59]. In this 
case, a total of 9,239 genic SSRs were identified based on the 
59,962 non-redundant unigene sequences generated from the 
Figure 1. SSR-based genetic map of tea plant showing location of QTLs for catechins content identified in the 'YS'x'BD' mapping 
population. Map distance scales in cM are placed at left margin. Markers with distorted segregation ratios are marked with asterisks according to 
their significance levels (*: 0.05, **: 0.01, ***: 0.005, ****: 0.001, *****: 0.0005, ******: 0.0001). Regions of linkage groups considered to be candidate 
segregation distortion regions (SDRs) are denoted by blue fill. Significant QTL for each catechin is represented by a different color (EC, red; ECG, green; 
EGC, coffee; EGCG, pink). Bars and lines indicate 1-LOD and 2-LOD support intervals. The solid bars and lines indicate stable QTL (detected across 
years), and empty bars and dashed lines putative QTL (only detected in one year). EC epicatechin, ECG epicatechin gallate, EGC epigallocatechin, EGCG 
epigallocatechin gallate. 
doi:10.1371/journal.pone.0093131.g001 



transcriptome assembly in our previous study [44]. About 12.7% 
of the unigenes possess SSR loci. The average frequency of SSRs 
was about 1/3.98 kb, which is higher than that estimated by Wu 
et al. (1/4.99 kb) using a set of 25,637 tea plant unigenes [43]. 
This is probably because the search parameters used for SSR 
exploration, e.g. the number of repeat units, were quite 
different between the two studies. Dinucleotide repeat motifs were 
the predominant repeat type among the unigenes analyzed herein, 
accounting for 38.9% of the total SSR loci identified, followed by 
trinucleotide (27.2%), hexanucleotide (17.2%), pentanucleotide 
(10.7%), tetranucleotide (4.4%), and higher-order repeats (1.6%). These 
results are generally consistent with the findings of Taniguchi et al. 
[42] and Wu et al. [43]. Out of the 1,141 newly designed SSR 
primer pairs, 831 (73%) yielded amplicons in the mapping 
parents, comprising 361 (32%) monomorphic and 470 (41%) 
polymorphic marker loci (Table S4). The polymorphic ratio is 
similar to those obtained in previous studies in tea plant (ranging 
from 31% to 70%) [25,36,39,40,42,43]. Among the polymorphic 
SSRs, 328 loci were heterozygous in at least one parent with either 
two or three alleles, indicating that these loci can be used for 
linkage analysis in the present mapping population. The proportion 
of mappable SSR markers is comparable with that reported 
by Taniguchi et al. [25]. Overall, deep transcriptome sequencing 
in tea plant offers an excellent opportunity to quickly identify a 
large number of genic SSRs. 
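
The counts and percentages quoted in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are ours, and the "implied sequence length" is a derived quantity (number of SSRs multiplied by the reported spacing of one SSR per 3.98 kb) that is not stated in the text.

```python
# Figures quoted in the paragraph above; all names are illustrative.
N_SSRS = 9_239
KB_PER_SSR = 3.98            # reported average frequency: 1 SSR per 3.98 kb
MOTIF_SHARE = {              # percentage of SSR loci by repeat type
    "di": 38.9, "tri": 27.2, "hexa": 17.2,
    "penta": 10.7, "tetra": 4.4, "higher-order": 1.6,
}
PRIMERS_DESIGNED = 1141
AMPLIFIED = 831
MONOMORPHIC = 361
POLYMORPHIC = 470
MAPPABLE = 328


def pct(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# The motif classes should account for all SSR loci.
motif_total = sum(MOTIF_SHARE.values())

# Derived (not reported) quantity: total sequence length implied by the
# SSR count and the average spacing, in megabases.
implied_sequence_mb = N_SSRS * KB_PER_SSR / 1000.0

print(f"motif shares sum to {motif_total:.1f}%")
print(f"implied sequence searched: {implied_sequence_mb:.1f} Mb")
print(f"amplified:   {pct(AMPLIFIED, PRIMERS_DESIGNED):.1f}% of designed pairs")
print(f"monomorphic: {pct(MONOMORPHIC, PRIMERS_DESIGNED):.1f}%")
print(f"polymorphic: {pct(POLYMORPHIC, PRIMERS_DESIGNED):.1f}%")
print(f"mappable:    {pct(MAPPABLE, PRIMERS_DESIGNED):.1f}%")
```

Note that the percentages in the text are expressed relative to all 1,141 designed primer pairs, and the monomorphic and polymorphic classes together make up the 831 amplifying pairs.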
Table 5. Phenotypic variation of individual catechins content in parental lines ('YS' and 'BD') and 183 F1 progeny. 

Trait        Year  YS            BD            F1 progeny
                   Mean ± SD     Mean ± SD     Mean ± SD      Range          CV (%)
EC (mg/g)    2010  9.46±0.09     19.33±0.11    12.50±3.61     6.55-29.72     28.9
             2011  6.37±0.56     18.08±0.36    11.90±3.78     5.26-24.59     31.8
ECG (mg/g)   2010  24.38±0.38    61.24±1.33    29.20±8.87     12.28-57.04    30.4
             2011  23.56±0.45    58.63±1.54    31.90±9.80     17.64-65.81    30.7
EGC (mg/g)   2010  36.72±0.60    34.61±1.06    34.79±11.95    7.91-69.68     34.3
             2011  17.94±0.65    33.37±1.82    23.68±7.65     8.02-44.42     32.3
EGCG (mg/g)  2010  160.96±2.59   139.88±2.71   124.86±15.09   90.17-159.58   12.1
             2011  154.32±2.71   156.51±4.30   118.41±15.30   86.48-157.37   12.9

EC epicatechin, ECG epicatechin gallate, EGC epigallocatechin, EGCG 
epigallocatechin gallate, SD standard deviation, CV coefficient of variation. 
doi:10.1371/journal.pone.0093131.t005 
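
The CV column of Table 5 follows from the progeny mean and SD as CV = 100 x SD / mean. The short sketch below recomputes it from the tabulated values as a consistency check; the data structure and names are ours, and the figures are copied from the table.

```python
# (trait, year): (progeny mean, progeny SD, reported CV %), from Table 5.
PROGENY = {
    ("EC", 2010): (12.50, 3.61, 28.9),
    ("EC", 2011): (11.90, 3.78, 31.8),
    ("ECG", 2010): (29.20, 8.87, 30.4),
    ("ECG", 2011): (31.90, 9.80, 30.7),
    ("EGC", 2010): (34.79, 11.95, 34.3),
    ("EGC", 2011): (23.68, 7.65, 32.3),
    ("EGCG", 2010): (124.86, 15.09, 12.1),
    ("EGCG", 2011): (118.41, 15.30, 12.9),
}


def coefficient_of_variation(mean, sd):
    """Coefficient of variation in percent."""
    return 100.0 * sd / mean


# Difference between the recomputed and the reported CV for each row.
deviations = {
    key: coefficient_of_variation(mean, sd) - reported
    for key, (mean, sd, reported) in PROGENY.items()
}

for (trait, year), delta in deviations.items():
    mean, sd, reported = PROGENY[(trait, year)]
    cv = coefficient_of_variation(mean, sd)
    print(f"{trait:5s} {year}: CV = {cv:5.2f}% (reported {reported}%, diff {delta:+.2f})")
```

Every recomputed value agrees with the reported one to within rounding at one decimal place. EGCG shows a markedly lower CV (about 12 to 13%) than the other three catechins (about 29 to 34%).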



Genetic map of tea plant 

A saturated genetic map can be a valuable tool for genetic 
research and molecular breeding. The first genetic map in tea 
plant was reported by Tanaka et al. [19] in a 'Yabukita' x 'Shizuka-Inzatsu 131' 
pseudo-testcross population with randomly amplified 
polymorphic DNA (RAPD) markers. However, that map contained only 
six linkage groups and very few markers. Later, 
several genetic maps of tea plant were constructed based on 
dominant marker systems, including RAPD, inter-simple sequence 
repeats (ISSR), and amplified fragment length polymorphisms 
(AFLP), with a total coverage from 1,180 to 2,545 cM [20-24]. 
However, the numbers of mapped markers on these linkage maps 
were still relatively small, ranging from 62 to 208. Recently, a 
high-density reference linkage map was developed based on a 
'Sayamakaori' x 'Kana-CK17' tea plant population [25]. The 
combined map contains 441 SSRs, 7 CAPS, 2 STS and 674 
RAPDs, and the number of linkage groups coincides with the 
basic chromosome number of tea plant (n = 15). However, the 
mapping population used in that case is relatively small, and there 
are several large gaps between adjacent markers in some linkage 
groups, which may decrease the power of QTL detection and affect 
the accuracy of QTL effect estimation for QTLs inside or near the 
gaps. Therefore, more genetic maps need to be constructed using 
much larger mapping populations. 

In this study, we developed a moderately saturated genetic map 
using SSR markers, based on a population of 183 individuals 
derived from the cross of two C. sinensis varieties. The total map 
length was 1,143.5 cM with 406 markers, which is close to the 
estimates for the tea plant genome reported by Ota and Tanaka [20] 
(1,640 cM), Hackett et al. [21] (1,349.7 cM), and Taniguchi et al. 
[25] (1,298 and 1,305 cM), but considerably smaller than those 
obtained by Huang et al. [22] (2,457.7 and 2,545.3 cM), and Hu 
et al. [26] (4,482.9 cM). Although an increase in total map length 
would generally mean more genome coverage, a direct comparison 
of these maps is not possible, because the accuracy of the 
genetic distance estimates is determined by several factors, 



Table 6. Pearson's correlations between the individual catechin compounds. 

        EC          ECG         EGC
ECG     0.61***
EGC     0.36***     -0.40***
EGCG    -0.47***    -0.52***    0.23**

EC epicatechin, ECG epicatechin gallate, EGC epigallocatechin, EGCG 
epigallocatechin gallate. 
**P<0.01, ***P<0.001. 
doi:10.1371/journal.pone.0093131.t006 



including the size and type of mapping populations, the number and type 
of mapped marker loci, mapping strategies, statistical algorithms 
and computer packages [60-64]. 

The average interval between two markers was 2.9 cM in the 
current genetic map, which is comparable with the marker 
densities of previously published maps (ranging from 1.9 to 19.0 cM) [19-26]. 
In addition, the mapped markers were generally uniformly 
distributed, and all linkage groups showed similar marker density, 
ranging from 2.2 to 3.9 cM (Table 4). The recommended marker 
density for genome-wide QTL mapping is a mean inter-marker 
interval of less than 10 cM [65,66]. Thus the genetic map 
constructed in the present study is suitable for QTL identification. 
However, there was still one gap larger than 20 cM between 
adjacent markers on LG04. The presence of large gaps may be 
explained in three ways. Firstly, the genic SSR markers were 
derived from transcriptome sequences, so genic-marker-based 
linkage maps represent only expressed regions 
of the genome (mostly the euchromatic regions); 
heterochromatin and other repeat regions may therefore be 
underrepresented, leading to large physical gaps between genic markers 
spanning these regions [66]. Secondly, the genome regions 
corresponding to the gaps in the genetic map may be homozygous 
in both mapping parents, so that no recombination can be detected 
there. Thirdly, the mapping population may not contain enough 
individuals to observe recombination in the gap regions. 
The presence of large gaps in the genetic map may lead to failure to 
detect QTLs in these regions. Thus further studies will be 
needed to fill this gap in the present genetic map. 
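
The reported mean spacing of 2.9 cM can be reproduced from the map totals. The sketch below rests on one assumption of ours that the text does not state: the number of inter-marker intervals is taken as the number of markers minus the number of linkage groups, since a group with k markers has k - 1 intervals. All names are illustrative.

```python
TOTAL_LENGTH_CM = 1143.5      # total map length reported in the text
N_MARKERS = 406               # mapped SSR markers
N_LINKAGE_GROUPS = 15         # linkage groups in the map
RECOMMENDED_MAX_CM = 10.0     # guideline for genome-wide QTL mapping [65,66]


def mean_interval(total_cm, n_markers, n_groups):
    """Mean inter-marker interval, assuming k - 1 intervals per group of k markers."""
    n_intervals = n_markers - n_groups
    if n_intervals <= 0:
        raise ValueError("need more markers than linkage groups")
    return total_cm / n_intervals


average_spacing = mean_interval(TOTAL_LENGTH_CM, N_MARKERS, N_LINKAGE_GROUPS)
meets_guideline = average_spacing < RECOMMENDED_MAX_CM

print(f"mean interval: {average_spacing:.2f} cM")
print(f"below the {RECOMMENDED_MAX_CM:.0f} cM guideline: {meets_guideline}")
```

Dividing by the raw marker count instead gives about 2.8 cM, so the reported 2.9 cM is consistent with counting intervals rather than markers. A mean value says nothing about individual gaps, which is why the gap of more than 20 cM on LG04 still matters for QTL detection.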

In order to establish the relationship with the reference map 
[25], we selected 45 anchor markers from the reference map for 
linkage mapping. Finally, 26 of these markers were mapped onto 
the present genetic map, distributed across the 15 linkage groups 
(Figure 1). According to these anchor loci, the homologous LGs 
of the two maps were validated, and showed good collinearity 
(Figure S2). With a total genetic distance of 1,143.5 cM, the 
present map is almost the same length as the reference map 
(1,212 cM), indicating a considerable level of genome coverage. 
Consequently, this new genic linkage map will provide a foundation 
on which a wide variety of genomic and genetic research can be 
built, facilitating molecular breeding of tea plant. 

Segregation distortion 

Segregation distortion (SD) is a phenomenon in which the genotypic 
frequencies at a locus deviate from the expected Mendelian ratios 
[67], and it has been described in many species, such as maize [68], 
wheat [69], mungbean [70], Eucalyptus [71] and Populus [72]. In tea 
plant, SD is a feature of most mapping populations [21-24,26]. In 
this case, a total of 30.7% of the tested markers were 
significantly distorted (P<0.05). This rate is in the range of those 
Table 7. Overview of the stable (detected across years) QTLs for catechins content detected in the 'YS' x 'BD' tea plant population. 

Trait        QTL      LG  Year  Position (cM ± 2-LOD)  LOD threshold^a  LOD score  PVE (%)  Nearest marker  a1^b    a2^b    d^c
EC (mg/g)    qEC3     3   2010  44.0 (37.0-51.7)       4.2              6.39       10.5     TM376           4.34    0.63    1.68
                          2011  44.7 (41.3-51.7)       4.3              7.88       14.5     TM546           4.97    2.54    1.65
             qEC11    11  2010  4.8 (3.0-9.8)          4.2              14.86      31.2     TM623           7.28    -4.33   -2.64
                          2011  3.0 (0-4.8)            4.3              10.52      23.9     TM586           7.15    -1.89   -2.48
ECG (mg/g)   qECG3    3   2010  77.9 (66.2-81.4)       4.2              5.09       2.6      TM560           -3.14   5.55    0.34
                          2011  65.4 (32.0-72.9)       4.2              7.2        3.9      TM453           -3.42   5.87    2.46
             qECG11   11  2010  3.0 (3.0-4.8)          4.2              52.85      71.0     TM586           29.68   -10.79  -6.35
                          2011  3.0 (3.0-4.8)          4.2              55.63      69.7     TM586           33.29   -8.32   -6.62
             qECG12   12  2010  64.6 (54.4-67.3)       4.2              4.66       2.5      TM340           4.71    0.80    2.53
                          2011  59.4 (45.0-67.3)       4.2              4.81       2.4      TM138           3.88    2.71    3.65
             qECG15   15  2010  13.6 (2.3-13.7)        4.2              7.17       3.5      TM399           -1.30   -7.09   2.38
                          2011  13.6 (2.3-19.2)        4.2              7.93       3.7      TM399           0.54    -7.94   1.03
EGC (mg/g)   qEGC3    3   2010  50.5 (41.3-58.0)       4.3              7.25       11.9     TM136           16.72   1.18    1.80
                          2011  54.4 (49.8-58.0)       4.2              9.26       14.9     TM412           11.45   4.77    2.03
             qEGC11   11  2010  4.8 (3.0-16.6)         4.3              10.54      19.3     TM623           -20.30  -2.25   -2.89
                          2011  0 (0-4.8)              4.2              7.21       12.3     TM435           -10.79  1.56    3.27
EGCG (mg/g)  qEGCG11  11  2010  0 (0-4.8)              4.3              17.9       40.1     TM435           -38.59  2.77    3.26
                          2011  0 (0-4.8)              4.3              29.13      52.4     TM586           -42.67  3.78    3.64

EC epicatechin, ECG epicatechin gallate, EGC epigallocatechin, EGCG epigallocatechin gallate, LOD logarithm of odds ratio, PVE phenotypic variation explained. 
^a The genome-wide LOD significance thresholds (P<0.05) based on permutation testing (n = 1000). 
^b a1 and a2 represent the additive (or average allele substitution) effects from the maternal parent and paternal parent, respectively. 
^c The overall dominance effects. 
doi:10.1371/journal.pone.0093131.t007 



previously obtained in tea plant (ranging from 12.0% to 32.9%) 
[21-24,26]. It has been shown that markers with 
significant SD have only a slight impact on marker order or map length 
[73,74]; hence the distorted markers were usually discarded in 
subsequent linkage analyses. However, in order to better 
understand the reason for SD in the current mapping population, 
we still used the skewed markers for linkage mapping, with the 
exception of those greatly affecting marker order or excessively 
changing linkage distances. Finally, 94 of the skewed markers were 
mapped onto the current genetic map, distributed across twelve 
linkage groups. 
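
As an illustration of how a marker might be flagged as distorted at P<0.05, the sketch below applies a chi-square goodness-of-fit test to a marker expected to segregate 1:1 in a pseudo-testcross. This is an assumption on our part: the text above does not state which test was used, and the genotype counts are hypothetical, chosen only to sum to the 183 progeny. For a single degree of freedom the chi-square tail probability equals erfc(sqrt(x/2)), so no external library is needed.

```python
import math


def chi_square_two_class(observed, expected_ratio=(1, 1)):
    """Chi-square goodness-of-fit test for a marker with two genotype classes.

    Returns (statistic, p_value). Valid for exactly two classes (1 d.f.).
    """
    if len(observed) != 2 or len(expected_ratio) != 2:
        raise ValueError("this sketch handles two genotype classes only")
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom the survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical genotype counts for two markers scored in 183 progeny.
skewed_stat, skewed_p = chi_square_two_class((110, 73))
balanced_stat, balanced_p = chi_square_two_class((92, 91))

for name, stat, p in (("skewed", skewed_stat, skewed_p),
                      ("balanced", balanced_stat, balanced_p)):
    verdict = "distorted" if p < 0.05 else "Mendelian"
    print(f"{name}: chi2 = {stat:.3f}, P = {p:.4f} -> {verdict}")
```

Markers segregating in more than two classes (for example 1:2:1 or 1:1:1:1) need more degrees of freedom and a general chi-square distribution function, which this sketch deliberately leaves out.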

There are several factors that contribute to SD, including 
non-biological factors (sample size and genotyping errors) and 
biological factors (gametic and/or zygotic selection) [69]. The 
percentage of markers with SD caused by the former is 
variable, while SD caused by the latter is generally believed to be 
related to genes that are subject to direct (gametic) selection [75]. 
In this case, the results of allelic and zygotic tests showed that most 
of the mapped skewed markers displayed allelic SD but not zygotic 
SD, suggesting that gametic selection may be the underlying 
reason for genotypic SD in the present mapping population. 
The five candidate SDRs also suggest that their corresponding 
regions in the tea plant genome may contain genes that affect 
viability or fitness. However, a more in-depth study will be 
required to investigate this possibility. 

QTLs for catechins content 

The dissection of QTLs for important agronomic traits and 
validation of marker-trait associations can facilitate selection of 
plants with desired features at early stages of growth. This is 
particularly valuable for woody plants such as tea plant, because 
breeding programs based on conventional phenotypic selection 
in these plants are often delayed by their long juvenile stage. 
Catechins are among the most important chemical components of 
tea leaves, and can greatly affect tea quality. However, the 
genetic basis of this phenotype remains poorly understood. In the 
present study, we report for the first time the mapping of QTLs for 
catechins content in tea plant. In total, 25 tea catechins QTLs 
were detected, and the population variability explained by each 
QTL varied from 2.4% to 71.0%, with an average of 17.7%. The 
high level of PVE exhibited by these QTLs suggests that the 
catechins content may be controlled by only a few critical genes. 
However, the relatively small size of the mapping population used 
herein may lead to overestimation of QTL effects and 
decreased statistical power to detect QTLs with smaller 
effects [76]. Thus, for further investigation, we need to increase 
the size of our mapping population for a better estimation of QTL 
location and effect. 

The accuracy of locating QTLs is also strongly influenced by 
environmental effects, because some environment-specific QTLs 
may be expressed differently in different environments [76,77]. These 
QTLs can be difficult or impossible to use in breeding for 
improvement of functional traits. In this case, seven QTLs were 
detected to be significant in only one growing year, while nine 
stable QTLs were validated across two years. Among these 
stable QTLs, six had major effects on catechins content, including 
at least one major QTL per trait. More interestingly, the major 
QTLs detected showed a marked tendency to co-localize, 
clustering on LG03 and LG11. This is particularly prominent in 
the region between 0 and 16.6 cM of LG11, where four major 
QTLs are located, one per trait. This chromosomal region 
probably contains multifunctional genes associated with catechins 
production and accumulation, and deserves further investigation. 
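
The co-localization on LG11 can be made concrete with simple interval arithmetic on the 2-LOD support intervals listed in Table 7. The sketch below is only a bookkeeping aid with names of our choosing; it is not a statistical test of pleiotropy versus tight linkage.

```python
# 2-LOD support intervals (cM) on LG11 from Table 7, as (start, end),
# keyed by (QTL, year).
LG11_INTERVALS = {
    ("qEC11", 2010): (3.0, 9.8),
    ("qEC11", 2011): (0.0, 4.8),
    ("qECG11", 2010): (3.0, 4.8),
    ("qECG11", 2011): (3.0, 4.8),
    ("qEGC11", 2010): (3.0, 16.6),
    ("qEGC11", 2011): (0.0, 4.8),
    ("qEGCG11", 2010): (0.0, 4.8),
    ("qEGCG11", 2011): (0.0, 4.8),
}


def overall_span(intervals):
    """Smallest region that contains every interval."""
    starts, ends = zip(*intervals)
    return min(starts), max(ends)


def common_overlap(intervals):
    """Region shared by all intervals, or None if they do not all overlap."""
    starts, ends = zip(*intervals)
    start, end = max(starts), min(ends)
    return (start, end) if start <= end else None


span = overall_span(LG11_INTERVALS.values())
shared = common_overlap(LG11_INTERVALS.values())

print(f"region covered by the LG11 QTLs: {span[0]}-{span[1]} cM")
print(f"segment shared by all intervals: {shared}")
```

The covered region matches the 0 to 16.6 cM window named in the text, and all eight support intervals share the short 3.0 to 4.8 cM segment, which is a natural first place to look for candidate genes.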

Supporting Information 

Figure S1 HPLC chromatograms of (A) catechin standard 
mixture, (B) tea catechins extracted from a typical 
tea sample (THB-010). 
(TIF) 

Figure S2 Comparison between the homologous linkage 
groups (LGs) of the 'YS' x 'BD' genetic map and the 
reference map of tea plant. The reference map is shown on the 
right in each LG, labeled LG01_Core to LG15_Core 
(published by Taniguchi et al. [25]). Map distance scales in 
Kosambi centimorgans (cM) are placed at the left margin. Lines 
connect anchor markers. 
(TIF) 

Figure S3 Frequency distribution patterns of catechins 
content in the F1 population derived from the cross between 
'YS' and 'BD'. Parental values are indicated with arrows. 
(TIF) 

Table S1 Primer sequences and characteristics of the 
novel genic SSR markers developed from the transcriptome 
of tea plant. 
(PDF) 

Table S2 Marker names, linkage groups, segregation 
types, segregation distortion (SD) patterns, and p values 